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Abstract 
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1 Introduction 


Monte Carlo (MC) calculations in lattice QCD with dynamical fermions are notoriously time con¬ 
suming. These simulations generally proceed through a numerical realization of an ergodic Markov 
process having the the desired lattice QCD probability distribution as its fixed point. In direct ap¬ 
proaches, the major stumbling block is the evaluation of the fermion determinant which is typically 
needed somewhere in the process. For interesting volumes V, the fermion matrix is extremely high 
dimensional and the time to compute the determinant exactly scales as V 3 . Hence computing the 
fermion determinant exactly is not a feasible option. 

The current standard workhorse for dynamical lattice QCD computations is the Hybrid Monte 
Carlo (HMC) algorithm pTp . In this case the problem of evaluating the fermion determinant is 
sidestepped by expressing the determinant as an integral over bosonic (pseudo-fermion) fields 
which become full-fledged dynamical fields in the Markov process. One criticism of the HMC 
method is its supposed inability to deal with an odd number of fermion flavors. Indeed, the natural 
settings for HMC are even-flavor theories where the pseudo-fermion heatbath is straightforward 
and the bosonized action is manifestly positive. However, this limitation is not fundamental and 
can be addressed within the framework of molecular dynamics algorithms [3j. This is a topic of 
current research. 

Even though the direct simulation of the fermion determinant is infamous for being nearly im¬ 
possible to implement, it promises distinct advantages over the pseudo-fermion method employed 
in HMC. Besides being able to accommodate any number of flavors, it has the potential of being 
a viable finite density algorithm in the canonical ensemble approach. The usual finite chemical 
potential algorithm in the grand canonical ensemble has the well-known sign problem and the 
imaginary chemical potential approach has the overlap problem |[4|] . Considering the canonical en¬ 
semble instead, one can project out a definite baryon number from the fermion determinant before 
the accept/reject step to stay in a given baryon number sector so that the overlap problem can be 
avoided [|5|]. In this case, it is essential to have an algorithm which accommodates the determinant 
directly. 

An interesting proposal for simulating the determinant directly has been put forward recently in 
Ref. [^]] . In that approach the idea was to split the determinant into infrared and ultraviolet parts and 
treat the infrared part exactly and the ultraviolet part approximately. This can in principle be turned 
into an exact algorithm [0], but it is not yet clear how well the systematic error of the splitting of 
the determinant was under control, particularly for small quark masses and large lattices. 

The approach that will be followed here has several roots. One important ingredient is an efficient 
evaluation of the determinant based on Pade-Z 2 stochastic estimators of the trace of of logarithm 
of the fermion matrix |^|]. For example, using the unbiased subtraction, one can reduce the error 
on the trace log of the Wilson fermion matrix on 8 3 x 12 lattice at (5 = 5.6 by a factor of 25-40 
relative to unsubtracted one with negligible overhead. At k — 0.154 with 400 Z 2 noise vectors, the 
absolute error of the trace log M is about 0.29, which translates into the same relative error for the 
determinant. 

Nevertheless, this would still not be good enough if one intended to develop a Metropolis [|9|] like 
algorithm, because the acceptance probability has to be evaluated exactly. To address this prob- 
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lem, Kennedy and Kuti (KK) proposed an algorithm in which the nonlinear Metropolis acceptance 
step was replaced with a linear one [[f(J|. This opened up the possibility of using unbiased noisy 
estimators for the required probability ratios instead of having to evaluate them exactly. Indeed, 
the required unbiased estimators can be developed based on the idea of stochastic series summa¬ 
tion [jljj. However, the quantity used as the KK linear acceptance probability can in principle be 
negative or greater than one [] when the noisy estimate comes from the outlying tails of the under¬ 
lying distribution. This introduces a bias in the results but the authors of Refs. [ [TOj , |TT| ] argue that 
in practice it is possible to tune both the expression for the linear acceptance probability and the 
estimators so that the bias is substantially smaller than the statistical errors. 


The above discussion motivates the second root of our approach which amounts to choosing 
stochastic variables so that they provide unbiased estimators for the determinant itself (rather than 
acceptance probability), which eliminates the need for the linear acceptance step, and allows these 
variables to be treated as full-fledged fields in the Markov process. This has been accomplished 
in Ref. [|T2|] and resulted in a procedure without probability bound violation problems. We will 
refer to algorithms based on this approach as Kentucky Noisy Monte Carlo (KNMC) algorithms. 
In Ref. [ |l2l l this idea was applied to a simple five state model where the amount of noise in the 
estimators could be precisely tuned. Although the statistical errors in the results of the KNMC 
method grew with increasing levels of noise, the result did remain unbiased while the bias in the 
KK procedure was substantially greater than the KNMC errors. 


Applying this approach to QCD requires not only a satisfactory way of estimating the determinant, 
but also an efficient way of proposing new configurations in the Markov process. Indeed, one can 
easily construct a useless algorithm when proposed configurations are almost always rejected. It is 
well known that changes of the gauge field constructed from sweeps guided solely by a pure gauge 
action can lead to a widely fluctuating determinant and an essentially vanishing acceptance proba¬ 
bility for small quark masses. To address this issue we adopt the idea of splitting the short-distance 
part of the determinant by the loop action and incorporating it into the pure gauge action [j7], [[3|. |T4j ]. 
This is the third ingredient of our approach. As a matter of fact, one of our points is that while 
we have only split the determinant with the simplest plaquette action, we nevertheless obtain a 
working algorithm at least for heavier quark masses. We view the inclusion of optimized higher 
loop actions for the split as being the most promising way of improving our algorithm further. 


In what follows, we present the results from applying the KNMC algorithm with the above specifics 
to two flavors of Wilson dynamical fermions. Even though the number of flavors is a mere param¬ 
eter in our approach, we use the two-flavor setting to be able to compare to HMC easily. The 
remainder of this paper is organized as follows. We begin by outlining the main ideas on which 
our algorithm is built in section |2| We then discuss the concrete application of the algorithm to 
Wilson fermions in section [3| where we discuss some of the numerical techniques used in our 
implementation as well as some work estimates. After presenting some computational details in 
section |] we discuss our assorted numerical results in sections |5j, 6] and [7. Finally we discuss these 
results in sections |8] and |9] and present our conclusions in section [I0[ 


Naturally, when this is the case the quantity in question fails to be a probability. We refer to the problem of the 
acceptance probabilities being negative or greater than one as low and high probability bound violations respectively. 
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2 The Algorithm 


We start by describing the basic ideas on which our algorithm is built. Our goal is to simulate a 
distribution given by the gauge invariant function of lattice link variables of the form 

Nf 

P QCD (P) oc e~ s ^ u) ndet M f (U) = e~ Sa{u)+ ^f^ TrlnA ^ (c/) (1) 

/=i 

where S g (U ) is the gauge action and Mf{U ) is the fermion matrix (det Mf(U) > 0) f] for a given 
flavor of dynamical quark. The indices / run over the number of flavors one wishes to simulate. 
For clarity of the discussion and notation below, we shall describe our algorithm using just a single 
flavor of fermion and drop the subscript / for now, with the understanding that the generalization 
to many flavors is straightforward. 

It will be assumed that there is a suitable approximation Rm{U ) of In M(U ) that is easy to evaluate, 
and whose accuracy can be controlled so that the corresponding distribution 

P(U) OC e-^^+Tr R m(U) (2) 


is arbitrarily close to P QCD (f/). 

We will construct an exact algorithm for P(U) of Eq. (j|) based on the following considerations: 

1. As pointed out in the introduction, the exact computation of Tr R M (U) is not feasible. For this 
reason one would like to use noisy estimators of this quantity. Let us consider 

x = E[ Tr R m (U), V ] = V*Rm(U)v (3) 

where rj is a vector in the linear space of M(U) whose elements are random numbers drawn from 
a distribution P v (r]) satisfying the property that 

(vhlj)p^v) = s ij • ( 4 ) 

In equation (Q) the subscripts on the angle brackets imply that the expectation value is to be taken 
in the measure defined by P T '(r/). In equation (|j|) the notation P[Tr R M (U),r]\ is also introduced, 
which may be used throughout this paper to indicate that a given quantity is an unbiased estimator 
for the first argument in the square brackets depending on the subsequent arguments. In this case, 
for example, x is an estimator of Tr R M (U), depending on the noise vector //. 

From equations (|3|) and (|4|) it is straightforward to show that x is indeed an estimator for Tr R M (U). 
However for simulating the measure defined by equation (2) estimates of the quantity e x are 
needed. If a sequence of estimates x t = r]}R M (U)rii , with i = 1, 2... , for x = Tr R M {U) 

2 In our general discussion we assume that determinant is positive for arbitrary gauge background. For Wilson 
fermions (unlike overlap fermions) this is not strictly satisfied but the rare encounter of negative sign can be in principle 
be monitored and taken care of by including the sign into the observables. In our case this standard strategy will be 
adopted for other reasons anyway. 
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is available to us, where the subscripts on rj now refer to the position of rj, in the sequence rather 
than its elements, we can construct an estimator for e Tr Rm <u ) by evaluating the function [jTT]] 


f(U,{r}i},{p k },c) = l + |xi + o{^-P 2 ) {'7 + 0 (^ C_P3 ) {^7 + '" 


+ ...91- 

,n 


pr 


X n 

— + ... 

c 


(5) 


where c > 0 is a tunable constant, 6 is the Heavyside step function and the pk are random number 
uniformly distributed in the range 0 < pk < 1 (in other words, pk has distribution P p (pk) = 
9{pk) — 9(pk — 1) .) One can easily verify that 


(/ (U, {Vi}, {p k }, c))^ PHvi) n « 2 PP{pk) 


= gTr R M (U) 


( 6 ) 


2. Motivated by the discussion above, and by the form of equation ( Zj), we extend the variable 
space and write the corresponding partition function in the form 

/ OO CO 

dU e- s °™ n dr h pV ( r h) n d Pk P P {Pk) f(u , V, p) , (7) 

i= 1 k =2 


where we have introduced the shorthand f(U, rj, p) for f(U, {r/, }, {/>/,}, c). We have thus intro¬ 
duced an infinite number of auxiliary variables. How can one deal with them in a practical simula¬ 
tion? The point is that given the nature of terms in Eq. (|5j) only a finite number of them will be used 
in any particular evaluation of f(U,r),p), since the series terminates stochastically. The average 
number of terms can be tuned by appropriate choice of the constant c, and if the typical values 
of Xk can be kept reasonably small during the simulation then a practical scheme with effectively 
finite number of noise fields present can be developed. 

3. The basic problem with partition function (|7|) is that fill, r/, p) is not positive definite, causing 
the well-known difficulties to standard simulation techniques. We will assume (and demonstrate 
later) that things can be arranged so that the occurrence of negative f(U, r/, p) in typical equilibrium 
configurations (U, rj, p) is very small. In that case one can cure this problem by absorbing the sign 
into the observables in the usual way, i.e. 


(O sgn(P) )|p| 
(sgn(P) )|p| 


( 8 ) 


Our goal then is to find a suitable Markov process for generating the probability distribution 


P(U,v,p) « e- s ‘W\f(U,i,,p)\ n *"(■».) II P 'M 


i =1 


k =2 


(9) 


4. One may attempt to simulate the distribution (|9]) in several possible ways. To explain the 
approach adopted here, let us introduce the collective notation £ = (//, p), f(U , £) = f(U, rp p), 
P(U, £) = P(U, rj, p). We can then write schematically P(U, £) oc Pi (f/)P 2 (f/, £)P 3 (£) with 

OO OO 

Pi(U) cx e~ s 'M P 2 (P,£) cx |/(P,£)| P 3 (£) oc n F n (Vi) II pP (Pk) ■ (10) 

i =1 k='2 
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We will use two steps based on the following two statements that can be verified directly: 

(a) Let T\ (U, U') be the ergodic Markov matrix satisfying detailed balance with respect to Pi, in 
other words P\ (U)T] (U, U')dU = Pi{U')Ti{U ', U)dU'. Then the transition matrix 


T 12 (U,U') 


T, (U, U') min 


’ P 2 (f/,0 J 


( 11 ) 


satisfies detailed balance with respect to the P 1 (f7)P 2 (^, 0 (with £ fixed), 
(b) The transition matrix 


r 23 (£,£') 


P 3 (£') min 


i Ml 
’ w,0 J 


( 12 ) 


satisfies detailed balance with respect to P 2 ((7, £)P 3 (£) (with U fixed). 

From (a), (b) it follows that Pi 2 and P 23 keep the original distribution P(U, £) invariant and inter¬ 
leaving them will lead to an ergodic Markov process with the desired fixed point. 

We note that there is a lot of freedom in choosing the pure gauge process Ti(U, U'). If local 
updates are used, then it is necessary to ensure that a given sequence of such updates satisfies 
detailed balance with respect to P\{U). This can be achieved for example by updating the sites at 
random or selecting the order of updated variables appropriately. We adopt the procedure wherein 
only links corresponding to chosen even/odd part of the lattice and chosen direction are updated. 
One can easily check that such a “sub-sweep” satisfies detailed balance for the Wilson pure gauge 
action if the elementary local updates also do so. Further, we note that in step (b) use is made of the 
fact that the probability distribution P 3 (£) for the noise can be generated directly from a heatbath. 

Finally it should be emphasized that in equations ( |TT[ ) and i|T~2) one needs to compute a ratio of the 
form: 

P2(U',0 \f(u',0\ 

P 2 (U,Z) \f(u,o\ 

where /([/,£) in Eq. (|j) is an estimator for e x . Since the quantity x is an estimator for the quantity 
Tr R m {U), it can be very large, as R M (U) is an extensive quantity. Looking at equation (Q) it can 
be seen that f(U,r),p ) can indeed give a very poor estimate of the exponential, if the x,}, are large, 
and only a few terms are taken. 

Ideally one would like to be in a situation where —1 < x\. < 0(1). Certainly when Xk < — 1, 
one faces the problem that /([/, r/,p) can become negative depending on the number of terms 
taken. If this happens only occasionally the effects can be taken into account by folding the sign of 
f(U, r /, p ) into the observable as in equation (8). However, if it happens often, it can cause a large 
effective reduction in statistics. 


While no firm upper limit has been placed on Xk we do note that the exponential function diverges 
rapidly for increasing x > 0. Given an infinite amount of statistics, the stochastic exponentiation 
technique will still give an unbiased estimator for e x . However, when x > 1 the terms in (|5|) 
have increasing absolute value, thus causing the variance of the estimators to become very large. 
Furthermore, in a Markov process such as the one described above, the evolution can potentially 
get stuck in a region of configuration space with a given number of terms (noise fields p) being 
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used to estimate f(U, 77 , p). This is because although having a large number of terms is unlikely, 
once reached with > 1 , then f(U, rj, p) will have a higher numerical value than it would with 
fewer terms (corresponding to a potential new noise field configuration p') in which case the new 
field is likely to be rejected. For this reason it is prudent in a simulation to arrange matters so that 
Xk is of 0 ( 1 ). 

The above discussion suggests that, while the approach described above theoretically leads to 
simulating the distribution ([ 2 ]), additional steps need to be taken to turn it into a practical scheme. 
We now discuss some ways that can be employed to deal with the issue of typical magnitudes and 
variances of x n below. 

2.1 Shifting the Action by a constant 

Motivated by the fact that a ratio of exponentials can be written as 

^0) 

gX g(x-xo) ’ (14) 

one notices that the fermionic action can be shifted by a constant through making the replacement: 

x{U,rj) = 7^R M (U)r] -> x(U,r],x 0 ) = R M (U)r] - x 0 . (15) 

Such a shift can move the mean of the distribution of the values of x to an arbitrary real number 
without affecting the simulation in any way. With this in mind, our main goal is to minimize the 
variance of x. 


2.2 Splitting with the Loop Action 


It is well known that a significant portion of Tr In M(U) can be typically taken into account by a 


short-distance loop action A S g (U) r |13l [14fl, especially at larger quark masses, and this is expected 
to remain true for Tr Rm(U ) also. This fact can be used to reduce the magnitude of the fluctuations 
in x by splitting this part of Tr R M (U) into the gauge action when setting up the Markov process 
(see e.g. P]). To recall the argument let us write 


P(U) OC e -Sg(U) e Tr R M (U) = e -S g (U)+AS g (U) gTr R M (U)-AS g (U) 


( 16 ) 


We can thus replace 

Sg(U) — S g (U) - A S g (U) 


Tr R m (U) —> Tr R M (U) — AS g (U) (17) 


in our Monte Carlo procedure. Then the gauge updates are performed with the new local ac¬ 
tion, and evaluation of f(U, ip p) involves the variables x n estimating Tr R M (U) — AS g (U). The 


specifics of how to do this will be discussed in section 3.3 
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2.3 Explicit Splitting 


Utilizing the fact that e x = (e x / N ) N , one can also split Tr R M (U) directly by writing Tr R M {U) = 
J2iLi 77 Tr R m (U), and use separate noise fields for every -TTr R M (U). Since N divides Tr R M (U) 
into N pieces, each carrying 1/N flavor, we shall refer to it as the number of fractional flavors. 
Indeed, the corresponding modification of Markov process is straightforward. To see this, con¬ 
sider for simplicity the case N = 2. Originally, the simulated probability distribution was written 
schematically as P(U,£) oc Pi(U)P 2 (U,£)P 3 (£), while now we have 

P(U,C 1 , 6 ) oc 

where the P| is P 2 of equation ( [IT)| ) with x from Eq. (|j) replaced by x/N. 

In the step (a) of the MC procedure we thus have P 2 —> P 2 (U, £,i)P 2 (U, £ 2 ) with £i ,£ 2 fixed. 
There is an arbitrariness in selecting the process (b). For example if one chooses to update a 
single set of noise at a time, step (b) does not change at all, and one can choose for example the 
sequence (a), (b)^, (a), (b ) ?2 as an elementary Markov step. The only requirement here is the 
overall ergodicity. 

The main effect of explicit splitting is to scale the width of the distribution of x by the number 
of fractional flavors N. The optimal mixture of action shifting, splitting by the loop action and 
explicit splitting is a matter to be explored. 


2.4 Reducing the Variance from Noise 


While splitting the loop action as described above reduces the fluctuations in x arising from the 
fluctuation of the gauge configurations, the variance of x also receives a contribution from the noise 
fields rj since i^Rm{U)^ is used in the construction of x. Further variance reduction techniques 
can be applied to reduce this contribution. The particular technique depends on the kind of noise 
used. In the specific case when Z 2 noise is used, it has been shown [ [15| ] that all the contributions to 
the variance of Tr Rm(U ) come from off diagonal elements of Rm(U ) in which case the unbiased 
subtraction noise reduction technique of 0 is highly effective. We will present details of this 


method in section 3.2 


3 Application to Lattice QCD with Dynamical Wilson Fermions 


To demonstrate that the ideas described in the previous section can lead to a working algorithm, we 
now describe the details of the implementation of the algorithm that we used to perform simulations 
with two flavors of degenerate Wilson quarks. Although in principle both the algorithm and the 
implementation can handle an arbitrary number of flavors, the case of two degenerate flavors is 
convenient from the point of view that its results can be checked against HMC simulations. Further, 
we can also carry out some tuning using these reference simulations as we shall detail in sections 
P| and |5|. 
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( 18 ) 


We simulate the theory with the standard Wilson gauge action 

S g (U) = -^Re Tr U n 
o 

where (3 is the gauge coupling parameter. The quantity U n is obtained as usual by evaluating the 
product of link matrices around each elementary plaquette and summing the results over the whole 
lattice. After integrating out the Grassmann numbers, the effective fermion action is 

N f 

S f (U ) = - £ Tr In M(U, «/) (19) 

/=i 

where the sum is over all desired flavors, M(U. nj) is the Wilson fermion matrix 

M(U, K f ) = 1 — K f D(U) , (20) 

D{U) is the usual Wilson hopping matrix and Kj is the hopping parameter for the flavor with index 
/. In our simulations we used an approximation R M (U) to In M(U) given by a Pade approxima¬ 
tion, which we will discuss in more detail in section [T 2 |. 


3.1 Local Gauge Update 


In order to update the gauge fields, we use the quasi heatbath method []16|1 amended as described 
previously. We choose one particular parity (even or odd) of our lattice and one of the 4 space-time 
dimensions at random, and update all the links with that parity and in that direction simultaneously. 
Each such sub-sweep allows us to update | of our lattice. As outlined earlier, one is free to perform 
any number of such updates before updating the noise fields. In fact, this remains a free parameter 
(. N s ) in our code. However for the results presented here we have always used N s = 1. 


3.2 Estimating Tr Rm(U) 

In order to estimate Tr Rm(U, n) we turn to the technology described in [j 8 j]. The logarithm is 
approximated using a Pade approximation, which after a partial fraction expansion, has the form: 

N P 

In M(U, k) « R m (U) = b 0 I - £ bi ( M(U, k) + a J )" 1 ( 21 ) 

i=1 

where Np is the order of the Pade approximation, and the constants b, and c t are the Pade coeffi¬ 
cients. In our implementation we have used an 11-th order approximation whose coefficients are 
tabulated in [| 8 []. 

The traces are then estimated by evaluating bilinears of the form rf Rm{U)t). If the components of 
rj are chosen from the Z 2 group, then the contributions to the variance of these bilinears come only 
from off diagonal elements of R M (U) as discussed previously. In this case[] an effective method 

3 In this sense Z 2 noise is optimal. With other types of noise such as Gaussian noise, the variance receives con¬ 
tributions from diagonal terms which one cannot subtract off. In this case the unbiased subtraction scheme described 
here is ineffective. 
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reducing the variance is to subtract off a linear combination of traceless operators from R M (U) 
and to consider 

E[ Tr R M (U),rj\ = ( R M (U) - to^) V . ( 22 ) 

Here the O, are operators with Tr O, = 0. Clearly since the O, are traceless they do not bias our 
estimators in any way. The oj, are constants, that can be tuned a priori to minimize the fluctuations 

in 77 [Tr R M (U),rj\. 

In practice the are constructed by taking traceless terms from the hopping parameter expansion 
for These reduce the noise coming from the terms ( M(U ) + c ,) _1 in equation ([IT]). The 

terms D, D 2 , I) ' and further odd powers of D are explicitly traceless and terms which have even 
powers such as I) 4 have known traces given in terms of various loops. For example 

Tr D 4 (U) = —64Tr U n (23) 

and hence 0\ = D A {U ) + 64 Tr Un is traceless. Details of finding the traces of even powers of D 
can be found, for example, in [T7j]. In our computations we have subtracted observables involving 
D, D 2 , D 3 , D A , D 5 and D\ 

Although the parameters a \ are tunable in principle, the hopping parameter expansion for M ~ 1 (U) 
is sufficiently good for heavier quark masses, that numerically the 'jJ, are close to unity. Hence in 
our simulations we have always used uj 1 — \ for all i. 

Since in our implementation we need the sum of Rm(U ) for all flavors, we can estimate the whole 
sum using a single noise field r/. This allows us to compute all the ( M(JJ , Kf) + q/) - 1 /] for all c t 
and all flavors Kf, for a given r] using a single multiple shift] inversion [ IT). In practice we employ 
the M 3 R [ [8j. [T9;] algorithm as it is the most memory efficient, and memory was a bottleneck issue 
on our target QCDSP architecture. 

3.3 Loop Splitting Specifics 

We now turn to the details of splitting the loop action. The fermionic action for a single flavor can 
be written as 

S f = - (Tr R m (U, K f ) - A / Re Tr U n ) ~ A / Re Tr Un, (24) 

where the is a tunable parameters for that particular flavor. One can then shift the fermion action 
for each flavor as follows: 

S f (U) -> - (Tr R m (U, K f ) - A / Re Tr U n ) ■ (25) 

At this point it becomes convenient to introduce the shorthand T(U, \ f) for the quantity 

T(U, X f ) = Tr R m (U , Kf ) - \ f Re Tr U n (26) 

and to write 

Sf(U) —> —T(U, X f ) . (27) 

4 A1so known as multi-mass. 
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In order to absorb this change, the gauge action needs to be correspondingly shifted as 

S g (U) -> --Re Tr U n - \ f Re Tr U n = Re Tr U n (28) 

3 3 

with an extra shifted term for each flavor of fermion. The end result is that the gauge action 
becomes: 

B> N f 

S g (U ) = —-—ReTr U n with f3' = f3 + 3 £ A / . (29) 

3 /=i 


The need to be tuned to minimize the variance of T(U, \ f). The tuning procedure is given by 
the action matching technology of Sexton, Irving and Weingarten [[Mj |20| ]. In fact, finding A/ nin , the 
values of \f for which the fluctuations of T(U, \ f) are minimized, corresponds exactly to tuning 
a quenched simulation to a dynamical fermion one in an action matching sense. The quantity A/ rlin 
is given (see pfi|j) by the formula 


A L 


Cov(Tr Rm(U , Kf), Re Tr U n ) 
a 2 (Re Tr U a ) 


(30) 


where a 2 (Re Tr Un) is the variance of the plaquette and the quantity in the numerator is the 
standard covariance between the Tr Rm(U , kj) and the plaquette. 


We note, the action matching technology of [|20], is not limited to simply tuning the Wilson plaque¬ 
tte action, but is fairly generic. In particular, it can be used to tune the splitting of the determinant 
by actions which are linear combinations of higher order loops. 


When a preliminary reference simulation is available at the desired parameters, one can measure 
the required covariances and correlations on this data-set. Otherwise, since the tuning of [i2p] can 
be carried out in any measure, one can perform a quenched simulation, and employ a self consistent 
procedure to find A/ Tlin . 

Once A ; lin are determined, one can immediately compute (T(U, A;( lin )) which are good first esti¬ 
mates for the action shift parameters Xq, which will ensure the quantities x J ' = E[T(U, A/ llin )] — x{ 
have means of 0. These may not be the optimal shift factors Xq, since it may be desirable to have 
{xf) > 0, to minimize number of negative sign violations. 

One can then shift the xJ even further so that practically all the values of ./A are greater than 0. 
This can be achieved by defining 

4 = (T(U, X f m J) - 4-Xi . (31) 

where x\ is some factor of a(E[T(U, A„ lin )]). 

The final value x that we use in equation (jS) is then 

x = ^Y. e XXA , )\-4 (32) 

iv / 

with Xq ctS defined in equation (pT[). 

For later reference, the values and definitions of all the parameters in our implementation are 
summarized in Table |1|. 
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Parameter 

Description 

P 

Gauge Coupling 

K f 

Fermion Hopping parameter (1 per flavor) 

N v 

Number of noise vectors per estimator of E[R M (U)] 

(we use N v = 1) 

LJi 

Parameters for the reducing the noise in 
E[TrR M {U),rj\ 

(we use uji = 1 for all i) 

r 

Target fractional residual in the multiple mass inverter 
(we use r = 10~ 6 for the lightest shifted mass) 

\ min 

x f 

Loop action splitting parameters 
( 1 per flavor ). The shifted gauge coupling 

isP' = P + 3Ef X Ln 

N 

Number of fractional flavors (explicit splitting terms) 

x o 

Action shifting constants 
(1 per flavor) 

X\ 

Action shift fine tuning factor 
(we use x\ = 2) 

c 

Variance control parameter for equation (|5) 

(we use c = 1.5) 

N s 

The number of scalar sub-sweep ’hits’ 
in the gauge update algorithm 
(we used N s = 1) 


Table 1: Summary of implementation parameters 
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3.4 Work Estimates 


The cost C of the present implementation of KNMC for each accepted update of the gauge field 
and noise fields is 


C 


N,! N N exp Cu 


+ Cr 


P T L 


+ 


Ny -^exp^M 

pL 


(33) 


In Eq. ( j33j ) above, the first term represents the computational cost of updating the gauge field, and 
the second corresponds to the contribution from updating a single noise field (out of the N). Here 
TVgxp is the average number of terms in the stochastic expansion of the exponential function in Eq. 
© which is e for the case c = 1. Cm is the cost of estimating Tr R M {U) for all flavors but for only 
one noise field, C G is the cost of updating the gauge configuration U. The quantities P^ cc and Pj cc 
are the acceptance rates for the gauge and noise updates respectively. The cost C G is negligible in 
comparison with Cm which is dominated by the time to perform the multiple mass solution of the 
system (M(k) + Ci)X = q for all k and c t . 


3.5 Volume Scaling 

The cost for creating a single estimator for x is dominated by the cost of the multiple mass solve. 
This should scale linearly with the volume. The quantity x itself is expected to scale with the 
square root of the volume, since evaluating the bilinear involves a sum of random numbers over 
the volume which can be positive or negative with an equal likelihood. Hence one would expect 
the variance a 2 (x) of x to scale linearly with V and so a(x) should scale as y/V. In this case the 
number of fractional flavors needed to keep cr(x) to be 0(1) must also increase as y/V. Hence the 
total cost of the algorithm must scale at least as 0(V 3 / 2 ). 


3.6 Comparison to HMC 


Let us compare our work estimate to that of a typical HMC accepted configuration, 
involved for grows as 


Chmc — 


^VmdCf + 20] 


H 


PIL 


The work 
(34) 


where A r M o is the number of time-steps one takes while integrating the Hamiltonian equations 
of motion for one trajectory. The predominant contribution to the cost of such a time-step is the 
computation of the molecular dynamics force for the time- step, which for fermionic systems 
involves solving the system of equations: (M^M)x = (j>, where </> are the pseudo-fermion fields. 
The cost Cu is the cost of calculating the energy which also requires the solution of a system 
similar to that of the force computation. The energy calculations are done at the start and end of 
the trajectory. While in principle one can carry out the inversions for the energy using a different 
stopping criterion from the one used for the force computation, it is convenient now to consider a 
case where this is not done and Cp = C\\. 


Since the predominant cost for our KNMC algorithm (assuming that P^ cc < Pf cc ) comes from the 
accept/reject step following the gauge field update; when the determinant has to be estimated for 
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all the fractional flavors (c.f. the first term of Eq. (|33|) ) we will neglect the cost of updating a 
single noise field (where the determinant only has to be estimated for a single fractional flavor - 
c.f. the second term of Eq. (|33|). Also, as C' G , the cost of performing the gauge update sweep, is 
negligible in the current implementation in comparison to Cm, the cost of performing a multiple 
mass inversion, the cost of the noisy algorithm is approximately 


c 


KNMC 


N v N N sxp Cm 

pu 

acc 


(35) 


Comparing Eqs. ( |35| ) and ( p4| ) and assuming that Cm ~ Cy since they both involve a solution of a 
similar set linear equations we note that the two algorithms are comparable when 


N v N N exp Amd 

pKNMC ~ pHMC 

1 1 o r'r* 


(36) 


where C.[) ( NM( - refers to the gauge acceptance rate of the KNMC algorithm and P,j ( T c MC refers to the 
HMC acceptance rate. In a typical application, A M d ~ 0(100) and P™ c ~ 0.8. As we shall 
see later on, our simulations using the KNMC algorithm managed to achieve P XXMC ~ 0.3, with 
A exp ~ 3 and N m 20, which makes our current simulations somewhat more expensive than their 
HMC counterparts. 


4 Computational Details 

We now briefly describe our numerical computations. In all we have performed 5 numerical stud¬ 
ies. One of these was a reference HMC simulation, another was a brief study of the behavior of 
the stochastic exponentiation technique and the remaining three were KNMC computations. 

Our implementation of the KNMC algorithm was coded for the QCDSP | ^T|] supercomputer, and 
was run on 1, 2 and 4 motherboard QCDSP computers located at Columbia University and the 
Brookhaven National Laboratory. Our code was written in C++ utilizing the Columbia Physics 
Software System which was made available to us by the RIKEN-BNL-Columbia (RBC) Collabo¬ 
ration. 

Our reference Hybrid Monte Carlo (HMC) simulation (see below) was carried out at the T3E 
facility at NERSC, using the GHMC [ [22| ] code made available to us by the UKQCD collaboration. 

Our analysis program, as well as our investigation of the stochastic exponentiation was carried out 
on workstations. 


5 Reference HMC Simulation 

In order to carry out the required tuning, and to have some benchmark results for our noisy sim¬ 
ulations we have performed a reference HMC simulation with two flavors of Wilson dynamical 
fermions using the desired physical simulation parameters listed in Table [2j. We generated 1280 
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Parameter 

Value 

(3 

5.5 


0.1550 

N f 

2 

V 

8 4 


Table 2: Physical Parameters for our Simulations 


Observable 

Value 

(Plaquette) 
(Tr R m (U)) 

0.5476(1)(4) 
640.9(6)(20) 


Table 3: Reference HMC Results. The firs set of errors are the naive bootstrap errors. The second 
set shows the effects of autocorrelation estimated by blocking the data. 


HMC trajectories, of which the first 625 were discarded for equilibration. Of the remaining 655 
trajectories we stored every fifth one to measure Tr R M {U) giving us a total of 132 configura¬ 
tions to work with. On these configurations we have estimated Tr R M {U) using 100 noise vectors 
per configuration. When the noise fields per configuration were averaged, the measurement of 
(Tr R m (U)) v was accurate to a relative error of less than 1% per configuration. 


5.1 HMC Observables 

The values of Tr Rm(U ) and the plaquette (normalized by the volume and the number of planes) 
measured in our HMC computations are shown in Table j3|. In the case of the plaquette, we used 
the values of the observable on all 655 trajectories. The statistical errors were first estimated using 
a simple bootstrap technique with 500 bootstrap samples. A blocking technique was then used to 
estimate the effects of autocorrelation on the observables. This technique consisted of averaging 
successive values of the observable in the time series into a single observable of a new data-set 
(with less statistics than the original). The naive variance was then measured on the resulting new 
data-set. This procedure was repeated until we ran out of statistics, or observed a plateaus in the 
variance. Unfortunately, this data is rather noisy and hence estimating the plateaus is somewhat 
subjective. We believe we have been conservative in Table |3|. 


5.2 Tuning X f mhl 


We now describe the results of performing the tuning for the A{ lin . Since, both the HMC and the 
KNMC simulations were done using degenerate flavors of fermions, we will drop the flavor index 
/ on this quantity from now on. 


We used the estimators of Tr R M (U ) to estimate A min using the tuning formula of equation (|30j). 
Before outlining the results we note that there are two ways of computing the variances and co- 
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variances in equation (|3T)[), the choice of which has a bearing on the resulting standard deviation, 
°<Tof T(uXj: 


1. Method 1: In this method, all the estimators E [Tr Rm(U), //] are first averaged over all 
the noise fields r; for a given configuration. This gives an estimate of (Tr Rm)^ per configu¬ 
ration with some small error. These estimates can then be used (neglecting the small errors) 
to perform averages with respect to the gauge fields as usual when computing variances, 
covariances and correlations. The results for cr(T((7, A)) as a function of A for this method 
are plotted in Fig. [1. 

2. Method 2: In this method one does not first average over the fields. Instead the averag¬ 
ing is performed over all the noise fields and gauge fields simultaneously when evaluating 
variances, covariances and correlations. The results for a {Till, A)) are plotted in Fig. |2| for 
this method. 

While Method 1 is perhaps the preferred method from the point of view of action and observable 
matching, the numbers from it may be misleading from the point of view of a noisy algorithm 
since it neglects the effects of noise in the estimation of Tr R M (U). However, one would expect 
the two methods to both give the same A min as in effect they are both equivalent to carrying out the 
same path integral. In Method 2 since more statistics are available, one may expect to get smaller 
errors on A min . Finally comparing the results of methods 1 and 2, one can get a rough idea of how 
much of the variance in our Tr R M (U) comes from the noise fields 77 and how much comes from 
fluctuations from gauge configuration to gauge configuration. 

We note in passing, that methods 1 and 2 can be thought of as opposite extremes of carrying out 
KNMC simulations with various values of N v . Method 1 corresponds to the situation where N v is 
large, and many conventional estimators E[ Tr R M (U)- rj\ are averaged, to get a better estimator, 
whereas method 2 corresponds to the situation where N v = 1. 

Looking at Table |] it can be seen that the two methods do in fact give similar results for A rnm . 
Method 2 appears more accurate, presumably because of the larger number of estimators available. 
By examining Figs, jl] and [2] the increase in statistics is clearly visible from the size of the horizontal 
error bar on the tuned point. It can also be seen that the minima are quite shallow in terms of A. 
The error bar on the point obtained with method 1 is quite large, despite the fact that the point 
itself lies near the minimum. With method 2 the error bar is smaller and the point is better placed. 
Our recommendation from these results would be to always check that the minimum is found, by 
performing some manual tuning around the value of A min given by the Eq. (j30j). 

We note that we carried out the measurements of Method 2 after our KNMC simulations as an 
afterthought. Hence our simulations all used values determined by Method 1. 


6 Stochastic Exponentiation Study 

Before we describe our KNMC simulation results, we will make another detour and experiment 
with the technique of stochastic exponentiation. A question of interest is: How good an estimator 
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Statistic 

Method 1 

Method 2 

<r(Tr R m (U)) 

7.99 

11.51 

Corr(Tr R M (U), Tr U n ) 

0.96(12) 

0 .66(1) 

Amin (xlO- 2 ) 

3.27(38) 

3.29(5) 

a(T(U, Amin)) 

2.09 

8.56 

(T{U, Amin)) 

-679.6(1) 

-689.60(7) 


Table 4: HMC Tuning Results 



A 


Figure 1: Tuning for A m i n using Method 1. The circle gives the result from equation (ATjj). The 
squares are the results of explicitly varying A around this minimum. 
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Figure 2: Tuning for A m i n using Method 2. The circle gives the result from equation (30). The 
squares are the results of explicitly varying A around this minimum. 
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0 (x)=O.Ol 



Figure 3: Bias in the Stochastic Estimation as a function of statistics 

E[e x ] of e x can one obtain by applying equation (|5|) to estimators E[x\ of x. In this section we 
attempt to give a partial answer to this question in a situation where both x and its fluctuations, as 
characterised by its standard deviation a(x), are under explicit control. 

In this study, noisy estimates E[x] were made for several values of x by adding Gaussian noise of 
known variance a 2 (x ) to the actual values of x. Eq. @ was then applied to these values of E[x\ to 
make estimators E[e x ) of e x . 

The results of this study are shown in Fig. [3| where the bias in the results of the stochastic exponen¬ 
tiation is plotted against the number of samples of E[e x ]. To be more precise, a number of samples 
of E[e x ] were averaged to obtain a measurement of (e x ) and this was subtracted from the true value 
of e x . It can be seen from Fig. [| that the technique works quite well for x = 3, c = 1 and about 
1000 samples. Increasing c to c = 1.5 allows one to get unbiased estimates for x = 5 for the same 
number of samples and it is even possible to get unbiased estimates for x = 6 for such a sample 
size if c = 2.0. However we note that as x is increased the fluctuations increase enormously too, 
as can be seen when one compares the scales on the vertical axes of Fig. [3 . The data shown in Fig. 
|3] confirm our earlier reasoning about the distribution of x in our earlier discussions, namely that it 
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is preferable for the value x in Eq. (5) to be small. 


7 KNMC Simulations 

We now turn to the discussion of our KNMC simulations. In all, three major simulations have been 
carried out, to which we shall refer as SI, S2 and S3 respectively. In all three simulations we have 
used the same physical parameters as in our trial HMC simulation (see Table [3). We used the loop 
splitting factor A m i n = 3.27 x 1CT 2 for both flavors as obtained by Method 1 of the tuning (see 
Table |j). For our value of Xq, we used (T(U, A min )) = —679.6 (Table with an additional fine 
tuning factor of x\ = 2 as per equation (|3T|). 

Using this value of A min , resulted in a gauge coupling shift of A f3 = 3 Nf x A„ lin = 0.1962 giving 
a value of [3' = 5.6962 to use in the quenched gauge updating algorithm (instead of the /3 = 5.5 of 
the HMC computations). 

The only difference between the three KNMC simulations was the value of the number of fractional 
flavors N which took the values N = 15, N = 20 and N = 25 for simulations ST, S2 and S3 
respectively. This choice was based on the values of cr(T(U, A min )) measured in the preliminary 
HMC simulation using Method 1. 

We show some basic statistics for the simulation in Table |5]. In particular, we give the number of 
negative signs for /((/, rj, p) that we counted along each simulation and the width of the distribution 
of x as characterised by its standard deviation a(x), with x defined as in Eq. (T2). 

7.1 Distribution of x 

In Fig. [4] we plot the distributions of the quantity x as measured in the three simulations. The 
distributions appear to be Gaussian as one would expect from the Central Limit Theorem. 

It can clearly be seen, that simulation -Sl is quite near the limits prescribed upon the values of the 
quantity x by the stochastic exponentiation study, namely that the values of x are getting near the 
upper limit of x = 4, x = 5 where the stochastic exponentiation technique begins to break down 
for our limited statistics. Also for simulation ST, it can clearly be seen that the lower tail of the 
distribution stretches well beyond 0. This manifests itself in that about 2.9% of the estimators for 
E[e x ] were negative which, has a noticeable effect on the statistical errors for observables as will 
be demonstrated shortly. 

Simulation S 2 seems to be more or less where one would expect this noisy method to behave well. 
A few of the estimates for x are larger than x = 5 and although the tail of the distribution stretches 
into the negative region, in practice this results in very few sign violations of f(U , //, p) (Only 1 
out of the total number of statistics equating to 0.02%). The trick of folding the sign of / into the 
observable may be a practical proposition in this case. 

Finally S3 is the best behaved of the simulations, with few values of x > 4 and no sign violations 
in /((/, p, p). The results of this simulation can be analysed with conventional techniques. 
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Probability Probability Probability 


Simulation 

N 

# Gauge Updates 

negative signs of f(U, r], p) 

(Number, %-age) 

<r(x) 

SI 

15 

2400 

(70, 2.9) 

0.944 

S2 

20 

4229 

(1,0.023) 

0.734 

S3 

25 

4050 

(0, 0) 

0.6 


Table 5: Summary of statistics for the noisy simulations 



Figure 4: Distributions of x for the three noisy simulations 
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Simulation 

N 

Gauge Update 
Acceptance (%) 

Noise Update 
Acceptance (%) 

SI 

15 

32(1) 

49(1) 

S2 

20 

33(1) 

53(1) 

S3 

25 

33(1) 

55.7(7) 


Table 6: Acceptance Rates for the noisy simulation 

7.2 Acceptance Rates 

The acceptance rates of the three KNMC simulations are shown in Table |>L One can see that the 
gauge acceptance rate seems not to depend on the number of fractional flavors used (N), whereas 
there is a marked increase in the noise update acceptance rate when N is increased. We believe 
that being able to achieve a gauge acceptance rate of around 33% by performing quenched updates 
at a shifted (3 is a great success of the action matching technology, however, for the algorithm to 
be practical it is somewhat low. Such a low acceptance rate, combined with updating only |-th of 
the lattice gauge fields with every update can result in very long autocorrelation times (as will be 
discussed in section [774]). Clearly for the algorithm to be practicable, a better gauge update scheme 
is needed. 

7.3 Observables 

In Fig. |5j we show our measurements of the plaquette and Tr Rm(U) for the KNMC simulations 
as well as the result of the reference HMC calculation for comparison. The error estimates for 
the noisy simulations do not include the effects of autocorrelations so as not to obscure the effects 
incurred by the sign violations in f(U, r), p ). 

We note with gratification, that the results for simulation ST appear unbiased, even with 2.9% of 
the estimates of f(U,i),p) having negative signs. However the errors on this value are massive 
when compared to those of the other simulations. 

In Table 0 we show the bootstrap errors on the numerator and denominator of Eq. ([§]) used in 
evaluating the expectation value of the plaquette in the presence of sign violations (for ST since 
S3 is free of sign violations and there is only 1 single violation in S 2). In the third line we tabulate 
the relative error in the plaquette measurements when the sign is not folded in - although it must 
be borne in mind that doing the analysis this way would give a biased value for the plaquette. 

In the case of simulation SI it can clearly be seen that the magnitude of the relative errors when the 
sign is folded in is about two orders of magnitude greater than when it is not, and that the relative 
errors in the numerator and denominator (first two lines in Table 0) are approximately the same. 
This clearly suggests that the errors are entirely dominated by the error in the sign. 
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Figure 5: Observables from the KNMC simulations. The plaquette is shown on the bottom graph, 
and (Tr Rm{U )) is shown on the top. The values plotted at N = 10 are the HMC results for 
comparison. 


Simulation 

%-age sign violation 

Observable 

Relative Bootstrap Error 



(Plaquette sgn(/)) 

0.709% 

51 

2.9 

(sg n(/)> 

0.708% 



(Plaquette) 

0.0037% 


Table 7: The relative errors in the numerator and denominator of the quantity needed to estimate 
the expectation value of the plaquette for 51. The third line shows the relative error on the plaquette 
without folding in the sign. 
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Figure 6: Variance of the plaquette measurement of ,S'3 as a function of block size. 

7.4 Autocorrelations 

Let us now turn to the question of autocorrelations. It is not entirely clear how to best estimate auto¬ 
correlation effects in the presence of sign violations. When a substantial amount of sign violations 
are present, one would expect these to be the dominant contributors to the statistical error in any 
case. However, we did attempt to make an investigation into autocorrelation effects in simulation 
53 where no negative signs are present in f(U, r), p ). 

Once again, we used the blocking procedure that was outlined earlier for our HMC simulation 
(section |5). The growth of the variance of the plaquette is plotted as a function of block size in 
Fig. p. It can be seen that the errors do not plateau as a function of block size, indicating that the 
integrated autocorrelation time is very long. We expect this is due in part to the fact that only |-th 
of the lattice is updated with every gauge update, and in part because the rate of acceptance for the 
gauge updates is quite low - about 33%. 

We note that in our experience with the quasi-heatbath method in quenched simulations, the pla¬ 
quette usually decorrelates in about 20-40 sweeps (at around this level of gauge coupling and on 
similar volumes). However in this implementation of KNMC only |-th of the lattice is updated 
with any one sweep. This in itself could be expected to increase the autocorrelation time to about 
160 — 320 accepted sweeps. This would not present a problem in a conventional quenched sim¬ 
ulation where gauge field updates are cheap and no sweeps are rejected (they are after all from 
a heat-bath). However when one couples a potentially expensive noisy accept/reject after each 
|-th update the computational cost increases significantly, so that one cannot hope to achieve the 
level of statistics in quenched simulations. In this case the low acceptance rate of the simulations 
becomes a problem. Clearly for the KNMC approach to be practicable, a better gauge update 
algorithm is needed than the one used here, with a higher acceptance rate. 
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Statistic 

Method 1 

Method 2 

<x(Tr R m (U)) 

10.38 

12.73 

Corr(Tr R M (U), Tr U n ) 

0.98 

0.81 

A miri (xl0~ 2 ) 

3.51(4) 

3.526(7) 

a(T(U, Ahmc)) 

2.0 

7.49 

a(T(U, A min )) 

1.87 

7.45 


Table 8: KNMC Tuning Results for one flavor for simulation S3. 

7.5 Checking the Tuning 

Before we proceed to summarize and discuss our numerical results, we would like to discuss the 
quality of the tuning for the three KNMC simulations. On physical grounds, one would expect 
that the parameter A min which minimizes the variance of the fermion action, is a universal quan¬ 
tity, and should depend largely on the physical parameters which define the expectation value of 
Tr In The amount of variance reduction thus achieved is expected to depend on the gauge 

generation algorithm to some degree, but certainly, one would expect some self consistency when 
carrying out the tuning on the HMC and the KNMC data-sets. 

To this end we repeated the procedure for tuning A m i n using both methods 1 and 2 on estimates 
of Tr Rm{U) produced during simulation S3 which is not affected by sign violations. The main 
difference here is that the number of estimates of Tr R M (U) varied slightly from configuration to 
configuration since the number of noisy estimates for x differs for each update. However, with 
N = 25 and a value of c = 1.5 the average number of terms used in evaluating f(U,ri,p ) was 
about 65 terms per gauge update. With over 4000 updates, these statistics should prove adequate. 

The re-tuning results are shown in Table ||. We note that the value of A min has now increased a 
little with respect to the HMC results ( Table |4j), however, this is not a very large change. Indeed, 
it is less than 10% of the HMC value in Table f|. 

We show in Table |8] the effect of the newly determined A m i n on the a values as well as the a value 
from the old HMC tuning for comparison. It can be seen that the change in the value of A m ; n from 
that of the HMC result does not reduce the a value by a great deal, probably because of the very 
flat minimum of cr as a function of A. 

A similar trend can be seen, when switching to method 1 from method 2, as was visible in the 
HMC case. When averaging the noise fields and effectively measuring (Tr Rm{U )) on each con¬ 
figuration, the subtraction of the loop action from the plaquette is much more effective than when 
using method 2 - (a reduction from cr ^ 10 to cr ~ 2 in the former case against a reduction from 
cr ~ 12 to cr ~ 7.5 in the latter). 

8 Summary of Numerical Results And Discussion 
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8.1 Tuning A min 

Our main result here is that the tuning can be done in two ways (methods 1 and 2) to carry out 
the minimization of a(T(U, A)). We found that a much larger degree of noise reduction can be 
achieved by subtracting the loop action using method 1 than method 2. Some of the gain seen here 
may be spurious due to neglecting the errors from the noise when averaging the noisy estimators, 
and some of it is real and comes from the fact in method 2 extra noise is being added to the 
computation, and the minimization is not actually carried out with respect to Tr R M (U) but with 
respect to noisy estimators of it. Further, the minima thus found is very flat with respect to A (see 
Figs. 0 and fl[) implying that not much gain may be made by dynamically tuning the A parameter. 

These results seem to imply that a greater improvement may be achieved in the acceptance rates of 
the noisy algorithm using the loop-splitting technique if more noise vectors were used in the noisy 
estimators of Tr R M (U) instead of the current 1 vector per estimator (i.e. if N rj was increased.) 
However, this would also imply more numerical work as computing each noisy estimator involves 
a multi- mass inversion. On the other hand it may be possible to reduce the number of fractional 
flavors (N) in return. Further investigation is required to establish when the trade-off becomes 
worthwhile. Finally, it was found that switching to method 2 from method 1 on the noisy data-sets 
showed a behavior pattern very similar to switching between methods 1 and 2 on the HMC data, 
even if the actual values were somewhat different. 

8.2 Stochastic Exponentiation Technique 

The stochastic exponentiation technique works well when the argument x to be exponentiated is 
small and positive. When x > 1 successive terms in the expression for f(JJ,rj,p ) have greater 
and greater numerical value although the probability of reaching these terms still drops factorially. 
This implies that the variance of the estimates is likely to be large when x is large, and also that 
the estimates for the exponential are likely to be poor when only a few terms are taken. If x is 
negative, one risks getting negative values of f(U, 77 , p) which can result in large statistical errors 
(see below). 

8.3 Observables and Sign Violations 

While the expectation values of observables appear to be unbiased in or simulations, it appears that 
even a small number of negative signs in f(U, 77 , p) - such as 2.9% of the total number of estimates 
- can completely dominate the statistical errors. In this situation the effort of creating more and 
more configurations goes into reducing the error in the estimate of (sgn(/([/, r/, p)) a more difficult 
problem than the usual -7= problem of reducing the errors in the bare observables. While in the 
KK linear accept/reject approach these sign violations manifest themselves as an explicit bias in 
the result, in KNMC this bias is traded for a larger statistical error. 
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8.4 Autocorrelations 


Our final result is that our gauge updating algorithm performs rather poorly. Although updating 
the gauge and noise fields is computationally easy, these updates are now coupled to a computa¬ 
tionally very expensive accept/reject step. The updates have long autocorrelation times and a low 
acceptance. This situation needs to be seriously addressed if the algorithm is to be competitive 
with say HMC. 


9 Issues not Addressed in this study 

This study was the initial foray into the study of KNMC algorithms. There are several issues which 
have not been addressed which are also relevant to the algorithm. We outline two of these here. 


9.1 Equilibration 


In our study, we have always started our simulations from an equilibrated configuration produced 
by our preliminary HMC study. One may very well ask the question: “How would we equilibrate 
our algorithm and tune the necessary parameters if the reference simulation was not present?” We 
point to the idea outlined in [20]. The idea presented there is that one can carry out an initial 
quenched simulation, which can be used to carry out a preliminary tuning. This will provide 
amongst other things a shifted (3 value. One can then carry out a second quenched computation 
with the shifted (3 value, thus bringing the quenched configuration distribution as close to the 
intended dynamical one as possible. At this point, one can start to carry out simulations with the 
noisy algorithm, re-tuning (3 and the other parameters along the way until a self consistency is 
achieved. This is possible because the tuning in |[2D|] can be carried out in any measure. 


9.2 The question of an infinite number of noise fields 

One may be concerned that since technically an infinite number of dynamical noise fields are 
present in Eq. (|]) it is not possible to update them all. In particular, very high order terms in Eq. 
(5|) may never be reached. Thus some of these noise fields will have infinitely long autocorrelation 
times. Another way of saying this is that the KNMC algorithm may not be ergodic in its infinite 
variable state space. 

While this is a problem in principle, we do not expect it to be a problem in practice, since the 
probability of reaching the higher order terms is factorially suppressed. Because of this suppres¬ 
sion, we expect that these fields can have little effect on our partition function and that any bias in 
our results from such fields is expected to be very much smaller than statistical errors. It may be 
possible to construct operators that probe these high order terms explicitly, where the effect of long 
autocorrelations should be clearly visible. Perhaps a more relevant potential setback comes from 
the visibly long autocorrelations of our gauge update procedure. We note that in the KK approach, 
this problem does not arise, since in that case the noise is not part of the state space. On the other 
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hand, algorithms adopting the KK accept/reject step have the in-principle problem of probability 
bound violations which can introduce a bias into the answers. Hence in choosing between the two 
approaches, one has a choice of which in-principle problem one wishes to accept as the challenge. 


10 Conclusions 

We have developed a QCD implementation for the Kentucky Noisy Monte Carlo approach and 
performed an initial numerical study in the context of two flavors of dynamical Wilson fermions. 
This study was a success in several ways, most notably since we have managed to assemble all the 
necessary numerical technology required for incorporating the fermion determinant directly, for 
the first time with controlled systematic errors. The method produced results that are consistent 
with reference Hybrid Monte Carlo simulations. 

We have gained valuable insight into the necessary tuning methodology, and have learned what 
essentially drives the algorithm, notably the stochastic properties of the quantity x, which needs 
to be distributed so that it is of 0(1) and has a small variance. A large variance leads to many 
excessively large estimates in the tail of the distribution, causing the stochastic exponentiation 
technique to be inefficient. Also, on the other side of the distribution, one could get many sign 
violations which, while not introducing bias, can lead to large statistical errors. Even though the 
distribution can be made arbitrarily narrow by employing more noise fields, by using more loops 
for splitting the determinant, and by using a larger number of fractional flavors, all these come at 
the price of an increase in computational cost. 

Unfortunately, in our current implementation, the algorithm is not particularly efficient. It suffers 
from the problem of long autocorrelations and rather low acceptance rates. One possible way for 
addressing this issue could be the use molecular dynamics for updating the gauge field. Once again 
however, this improvement would come at a potentially high computational cost. 

In addition, several other improvements have been suggested for making the algorithm more effi¬ 
cient P3|], notably using the technique of eigenvalue deflation m and the use of additional noisy 
techniques [Q [26|]. both to improve the convergence of matrix inversions. 

Despite the above shortcomings in the current implementation (related mainly to the efficiency), 
we believe that our approach holds a great future promise with its capability to handle an arbitrary 
number of fermion flavors. Also, and perhaps more importantly, in combination with the projection 
of the definite baryon number from the determinant, it is the contending candidate for the future 
finite density algorithm of [^]] . 
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